10/11/2018

Overview

Objectives

  • Why manage code?
  • Why manage products?
  • What is version control?
  • Introducing tools for version control.
  • A hands-on example

Code Management

Goals:

  • Track changes over time
  • Avoid issues from you or collaborators working on the "wrong" file
    • e.g. I've been working on FinalAnalysisFinalFinal15.R only to find out that you have been editing FinalAnalysisFinal17.R
  • Ensure that future researchers can reproduce your analysis

Product Management

What are products?

  • Intermediate analysis objects
    • Complex statistical models
    • Visualizations
    • Simulated or manipulated data (on which note, use set.seed(n) for reproducibility!)

Goals:

  • Maybe your workflow for simulating or editing the raw data changes – track that!
  • Perhaps you want to do secondary analyses based on a primary data product that arises from manipulated data – track that!

Caveats:

  • Different people advocate for different approaches
  • I have tended to track changes to code that generates these objects
  • Some advocate for tracking both the code and the objects (e.g. serialized .rda or .RData files, or .eps or .jpeg figures)

Version Control

Different ways to keep track of files

  • Don't do it
  • Numbered files
  • Formal version control

What might happen?

  • Don't do it
  • Numbered files
    • The collaborator dilemma - or realizing you edited the wrong file
  • Formal version control
    • More easily compare changes over time
    • Can revert to previous versions
  • We will focus on Git + GitHub
    • Relatively user friendly and huge user base
    • Many resources for this toolkit.
    • With GitHub - harder (if not impossible) to lose your work

Relevant history of Git

  • Developed in 2005 by Linus Torvalds to manage development of the Linux kernel
  • Why would Linux developers need version control?
    • Thousands of programmers, each needing local access to the entire Linux code base
    • Ability to track and (painlessly, or at least less painfully) share changes to code with others
  • Adoption by scientific community
    • Used for open-source software development (e.g. ggplot2, scikit-learn)
    • Large collaborative enterprises (Large Hadron Collider, Big Data Genomics, rOpenSci)

Features of Git

  • History of changes
  • Able to go back (revert to previous versions)
  • Less likely to break "production code"
  • Merging changes from multiple people
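A minimal sketch of the first two features (history and going back), run in a throwaway directory. The file name analysis.R, its contents, and the placeholder identity are assumptions made for illustration only.

```shell
set -e

# Work in a temporary directory so no real project is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"            # placeholder identity (assumption)
git config user.email "demo@example.com"

# First version of a toy script, recorded as a commit.
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "First version"

# Second version that we later decide was a mistake.
echo 'x <- 2' > analysis.R
git commit -q -am "Second version"

# History of changes, newest first.
git --no-pager log --oneline

# Go back: add a new commit that undoes the most recent one.
git revert --no-edit HEAD > /dev/null

# The file is back to its first version, and the history keeps all three commits.
cat analysis.R
```

git revert is only one way to go back; it was picked here because it leaves the earlier commits in the history rather than rewriting them.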

Rough organizational sketch:

  • Files → Folders → Repositories
  • Each repository is akin to a distinct project or related projects

Features of GitHub

  • A remote server that tracks your work and provides web hosting for project pages
  • Provides a graphical user interface
  • Facilitates collaboration
    • GitHub issues - Not only can people flag problems, they can even submit suggestions for corrections!
    • Forking repositories and web hosting

Commit in Git

"The fundamental unit of work in Git is a commit. A commit takes a snapshot of your code at a specified point in time. Using a Git commit is like using anchors and other protection when climbing. If you’re crossing a dangerous rock face you want to make sure you’ve used protection to catch you if you fall. Commits play a similar role: if you make a mistake, you can’t fall past the previous commit. Coding without commits is like free-climbing: you can travel much faster in the short-term, but in the long-term the chances of catastrophic failure are high! Like rock climbing protection, you want to be judicious in your use of commits. Committing too frequently will slow your progress; use more commits when you’re in uncertain or dangerous territory. Commits are also helpful to others, because they show your journey, not just the destination."

-Hadley Wickham

Using Git locally

Local Git exercises

  • Make a directory locally
  • Run git init
  • Create and edit a file
  • Stage it
    • Realize you want to make some changes
  • Reset and inspect outputs
  • Tell Git to ignore undesired file types (e.g. log files like .Rout or static files like .pdf)
  • Commit the edited file
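The exercise above could look roughly like this on the command line. This is a sketch under assumptions: the contents of foxy.R and the placeholder identity are invented, and the ignore step is left out to keep the focus on staging, unstaging, and committing.

```shell
set -e

# Make a directory locally and turn it into a repository.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Demo User"            # placeholder identity (assumption)
git config user.email "demo@example.com"

# Create a file and stage it.
echo 'print("The quick brown fox")' > foxy.R
git add foxy.R
git status --short        # foxy.R is listed as staged

# Realize more changes are needed: unstage the file and inspect again.
git reset -q -- foxy.R
git status --short        # foxy.R is listed as untracked again

# Edit the file, then stage and commit the version we actually want.
echo 'print("jumps over the lazy dog")' >> foxy.R
git add foxy.R
git commit -q -m "Add foxy.R"
git --no-pager log --oneline
```

Unstaging with git reset leaves the file on disk untouched; only the staging area changes.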

Break

  1. GitHub account setup and Git installation
  2. QUBES account setup and Git configuration

Git Logic

How Git is tracking your files

Changes over time

Hands on example

  • Make a new scratch directory
  • $ git init; $ ls -la; look inside the new .git folder with $ cd .git; navigate back with $ cd ..
  • Create and edit a toy code script (foxy.R)

  • $ git diff
  • Stage it; $ git diff --cached
    • Oh no! I wanted some examples in there!
  • Run R CMD BATCH foxy.R or Rscript foxy.R
    • There are static docs I don't want!
  • Add a .gitignore
  • Commit the .gitignore and your changed code file
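A sketch of how git diff, git diff --cached, and .gitignore fit together in this example. Assumptions: an initial commit is made first so that git diff has something to compare against, the contents of foxy.R are invented, and an empty foxy.Rout stands in for the output R CMD BATCH would write, so R itself is not needed to run this.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo User"            # placeholder identity (assumption)
git config user.email "demo@example.com"

# Starting point: a committed, nearly empty script.
echo '# foxy.R: toy script' > foxy.R
git add foxy.R
git commit -q -m "Start foxy.R"

# Edit the script. git diff compares the working copy with the staging area.
echo 'print("Hello from foxy")' >> foxy.R
git --no-pager diff

# After staging, plain git diff is empty; --cached shows what is staged.
git add foxy.R
git --no-pager diff
git --no-pager diff --cached
staged=$(git diff --cached --name-only)

# Stand-in for the log file R CMD BATCH would produce (assumption).
touch foxy.Rout

# Tell Git to ignore generated files, then commit the code and the ignore rules.
printf '%s\n' '*.Rout' '*.pdf' > .gitignore
git add .gitignore foxy.R
git commit -q -m "Add example output line and ignore rules"
git status --short        # foxy.Rout does not show up
```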

Introducing GitHub

Git + GitHub

Solo example from before

When you have an existing folder with existing projects

  • Navigate to GitHub
  • Create an empty repository
    • Could name it GitR
  • Obtain URL for git clone command
  • Make a new, empty folder
  • Clone it into this new empty folder
  • Move files over into this folder
  • $ git add, $ git commit, and $ git push as usual
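The same workflow can be rehearsed without a network connection. In this sketch a local bare repository stands in for the empty GitHub repository, so the path in remote_url is an assumption that replaces the URL you would copy from GitHub. The folder old_project and the file analysis.R are also hypothetical.

```shell
set -e

base=$(mktemp -d)

# Stand-in for the empty repository created on GitHub (assumption).
git init -q --bare "$base/GitR.git"
remote_url="$base/GitR.git"

# An existing folder with existing project files (hypothetical content).
mkdir "$base/old_project"
echo 'x <- rnorm(10)' > "$base/old_project/analysis.R"

# Clone the empty repository into a new, empty folder.
# Git warns that the repository is empty; that is expected here.
git clone -q "$remote_url" "$base/GitR"

# Bring the existing files over (copied rather than moved in this sketch).
cp "$base/old_project/analysis.R" "$base/GitR/"

# Add, commit, and push.
cd "$base/GitR"
git config user.name "Demo User"            # placeholder identity (assumption)
git config user.email "demo@example.com"
git add analysis.R
git commit -q -m "Import existing analysis"
git push -q -u origin HEAD
```

With a real GitHub repository, only remote_url changes; the remaining commands stay the same.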

Solo example: Version 2

Initializing a new folder for a new project

  • Example: new R package, new set of analyses for a project, new manuscript
  • Make a repository on GitHub
  • Clone it to a new empty folder
  • Make files and use version control!

Group exercises

Branching

Recreate the following:

Version control relay race!

  • Write code to
    • Clean up a toy dataset
    • Perform a linear model
    • Recreate the plot in the previous slide
    • Recreate this plot with a different style (e.g. pch=...) or plotting library (ggplot2 for instance).
  • Create a new repository under the EEB504 organization: https://github.com/EEB504
  • Use version control collaboratively to:
    • Generate one final, clean version of the code with the group's favorite single figure
  • Hint: Use $ git push to share your changes with the group, $ git fetch to download your collaborators' changes, and $ git merge to combine the different pieces of code across the collaborators in each group.
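The push, fetch, and merge cycle from the hint can be tried locally first. This is a sketch under assumptions: a local bare repository stands in for the group's GitHub repository, the two collaborators (Alice and Bob) are simulated as two clones on one machine, and the R one-liners written into the files are hypothetical placeholders.

```shell
set -e

base=$(mktemp -d)

# Stand-in for the group's shared GitHub repository (assumption).
git init -q --bare "$base/shared.git"

# Collaborator A clones the empty repository, adds the cleaning step, and pushes.
git clone -q "$base/shared.git" "$base/alice"
cd "$base/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo 'dat <- na.omit(airquality)' > clean.R
git add clean.R
git commit -q -m "Clean the toy dataset"
git push -q -u origin HEAD
branch=$(git symbolic-ref --short HEAD)     # default branch name on this machine

# Collaborator B clones, adds the model, and pushes.
git clone -q "$base/shared.git" "$base/bob"
cd "$base/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo 'fit <- lm(Ozone ~ Temp, data = dat)' > model.R
git add model.R
git commit -q -m "Fit a linear model"
git push -q origin HEAD

# Meanwhile A has committed a plot locally, so the two histories have diverged.
cd "$base/alice"
echo 'plot(Ozone ~ Temp, data = dat, pch = 19)' > plot.R
git add plot.R
git commit -q -m "Plot the data"

# Fetch B's work, merge it into A's branch, and share the combined result.
git fetch -q origin
git merge -q --no-edit "origin/$branch"
git push -q origin HEAD
ls
```

Because the two collaborators touched different files, the merge completes without conflicts. Edits to the same lines of the same file would need to be resolved by hand before committing the merge.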

Data wrangling and version control example:

Hosting websites on GitHub

GitHub Pages

  • GitHub can host a website for any project associated with an existing repository
  • Usually the website is built from .html and related files housed in a gh-pages branch or a docs directory
    • The other option is to serve the site from the root of the master branch, but that is quite messy
  • For this course, I used docs

GitHub Website Demo

  • (For a new project:) Navigate to GitHub and initiate a new repository
    • repo name: MischiefManaged, select R .gitignore
    • In the case of this class, we will be building in the existing docs folder
  • On your local machine, navigate to the desired (new) directory and perform git clone
  • Create a .gitignore if you didn't already have one
  • mkdir docs; touch _site.yml; touch index.Rmd; touch about.Rmd

Building page architecture

atom _site.yml

name: "Padfoots-website"
output_dir: "."
navbar:
  title: "Padfoot's Website (Mischief Managed)"
  left:
    - text: "Home"
      href: index.html   # must match the name of your .Rmd file (index.Rmd builds index.html)
    - text: "About Me"
      href: about.html

Adding content to the pages

atom index.Rmd

---
title: "Padfoot's Website (Mischief Managed)"
---

Hello, World!

atom about.Rmd

---
title: "All about Padfoot"
---

Marauder's Map

Building the site

$ touch build_site.R

Git hosting

  • git add -A
    • Generally we don't endorse git add --all; use this only when necessary
  • git commit -m "First website version"
  • git push -u origin master

Group Website Exercise

Collaborating to build a website

  • Time to build your own website!
  • Think about a website you would be interested in building:
    • A personal website
    • A project website
    • Lab website
  • Develop your own website and work with your group members (and us!) to solve any problems that arise

Local Git usage

For future reference

Local machine

Editing local file

  • I staged the file (using git add foxy.R)
  • but now I want to edit it

Edited local file

Ignore some of my files

  • We often do not want to commit "static" files (e.g. .pdf, .jpeg)

Finally ready to commit

  • I modified foxy.R and now I'm ready to track this version
  • I realized that the version I staged before did not have the minimum working changes I wanted

Voila! (End of local example)

Setting up QUBESHub
